ABSTRACT

The determination of galaxy merger fraction of field galaxies using automatic morphological 
indices and photometric redshifts is affected by several biases if observational errors are not 
properly treated. Here, we correct these biases using maximum likelihood techniques. The 
method takes into account the observational errors to statistically recover the real shape of the 
bidimensional distribution of galaxies in redshift - asymmetry space, needed to infer the redshift 
evolution of galaxy merger fraction. We test the method with synthetic catalogs and show its 
applicability limits. The accuracy of the method depends on catalog characteristics such as the 
number of sources or the experimental error sizes. We show that the maximum likelihood method 
recovers the real distribution of galaxies in redshift and asymmetry space even when binning is 
such that bin sizes approach the size of the observational errors. We provide a step-by-step guide 
to applying maximum likelihood techniques to recover any one- or bidimensional distribution 
subject to observational errors. 

Subject headings: Data Analysis and Techniques 



1. INTRODUCTION 

The currently popular hierarchical ΛCDM models are successful at explaining the structure build-up of the cold dark matter component of the Universe (Springel et al. 2005). But such models have difficulties when explaining the evolution of the baryonic component, even with modeling that incorporates star formation, active galactic nuclei and supernova feedback, and the multiphase nature of the interstellar medium (De Lucia & Blaizot 2007, and references therein). An open question is the role of galaxy mergers in the formation of today's galaxies, especially the most massive ellipticals. The observational determination of the merger rate, $\Re_{\rm m}$, and its evolution with redshift provides empirical clues on the amount and the timing of the merger activity. These quantities also constitute key inputs for semi-analytic models of galaxy formation and evolution.

The merger rate, defined as the number density of merger systems at a given redshift, depends on the merger time $\tau_{\rm m}$, which can only be estimated by N-body simulations and simplified models (Mihos 1995; Patton et al. 2000; Conselice 2006). On the other hand, the galaxy merger fraction $f_{\rm gm}$, defined as the number of merger galaxies in a given galaxy sample in a redshift interval, is a direct observational quantity. Many works have determined the galaxy merger fraction, usually parametrized as $f_{\rm gm} = f_{\rm gm,0}\,(1 + z)^m$, using different methods: morphological criteria (Conselice 2003; Lavery et al. 2004; Cassata et al. 2005; Lotz et al. 2008; Bridge et al. 2007; De Propris et al. 2007), kinematic close companions (Patton et al. 2000, 2002; Lin et al. 2004; De Propris et al. 2005, 2007), spatial close pairs (Le Fèvre et al. 2000; Bundy et al. 2004; Bridge et al. 2007; Kartaltepe et al. 2007) or the correlation function (Bell et al. 2006; Masjedi et al. 2006). In these works the value of the merger index varies in the range $m = 0$ to $4$. ΛCDM models predict $m \sim 3$ (Kolatt et al. 1999; Governato et al. 1999; Gottlöber et al. 2001).

The morphological criterion for determining the galaxy merger fraction (see Conselice 2003, hereafter C03) is based on the fact that, just after a merger is complete, the galaxy image shows strong geometrical distortions, in particular asymmetric distortions. Hence, high values of the automatic asymmetry index A (Abraham et al. 1996; C03) are assumed to identify merger systems. Other automatic morphological indices, such as $M_{20}$ and G, have also been used to determine the evolution of the galaxy merger fraction with redshift (Lotz et al. 2008). The determination of morphological indices, which must be done on HST images, is affected by surface brightness dimming and K-corrections, so the errors of the indices grow with redshift and are more important for faint galaxies.

In this paper, we present a method based on the maximum likelihood (ML) technique to handle the effects of the large errors on the determination of the galaxy merger fraction. Galaxy merger fraction determinations using morphological criteria are generally done on large photometric surveys such as AEGIS (Davis 2007), COMBO-17 (Wolf et al. 2003), COSMOS (Scoville et al. 2007), GOODS (Giavalisco et al. 2004), or SWIRE (Lonsdale et al. 2003). We therefore address the effects of errors in the galaxy asymmetry indices as well as errors on the photometric redshifts.

In Section 2 we review the maximum likelihood method for determining bidimensional distributions. Its application to the galaxy merger fraction determination is given in Section 2.2. These sections have a high mathematical content, and a statistics background is recommended. Then, in Section 3 we summarize the simulations made to test the general method and how it improves the galaxy merger fraction determination (Section 3.7). In Section 4 we provide an outline for the application of the ML method to any one- or bidimensional experimental distribution subject to observational errors. Our conclusions are presented in Section 5.

2. METHODOLOGY 

Following Conselice (2006), we define the galaxy merger fraction by morphological criteria as

$$f_{\rm gm} = \frac{\kappa\, N_{\rm m}}{N_{\rm tot} + (\kappa - 1)\, N_{\rm m}}, \qquad (1)$$

where $N_{\rm m}$ is the number of distorted sources in the sample, classified as the systems with a value of the asymmetry index A higher than a limiting value $A_{\rm m}$ (see C03 for details), $N_{\rm tot}$ is the total number of sources in the sample, and $\kappa$ is the average number of galaxies that merged to produce the $N_{\rm m}$ merger systems. We use $\kappa = 2$ throughout this paper.

In order to compute the galaxy merger fraction and its redshift evolution we must know the underlying distribution of the z and A values, which we assume is represented by a bidimensional histogram in redshift and asymmetry space. This bidimensional histogram is defined by the number of sources in each redshift-asymmetry bin. Normalizing the histogram to unity yields a bidimensional probability distribution defined by $p_{kl}$, the probability that a source has redshift in bin k and asymmetry in bin l. Index k scans the redshift bins of size $\Delta z$ and index l scans the asymmetry bins of size $\Delta A$. In our case we just need two asymmetry bins separated by $A_{\rm m}$: the $l = 0$ bin represents normal sources and the $l = 1$ bin represents merger systems. The galaxy merger fraction in redshift bin $[z_k, z_{k+1})$ is then

$$f_{{\rm gm},k} = \frac{2\, p_{k1}}{p_{k0} + 2\, p_{k1}}. \qquad (2)$$



The accuracy with which the $p_{kl}$ can be obtained degrades significantly when photometric redshifts, $z_{\rm phot}$, are used, and for typical errors of A in deep HST surveys. This introduces strong biases in the determination of the galaxy merger fraction.
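
As a minimal illustration of Equations 1 and 2, the sketch below evaluates the merger fraction either from source counts or from the two bin probabilities of one redshift bin. The function names are hypothetical and are not taken from the paper; the two forms agree because $N_{\rm tot} \propto p_{k0} + p_{k1}$ and $N_{\rm m} \propto p_{k1}$.

```python
def merger_fraction_counts(n_merger, n_total, kappa=2.0):
    """Equation 1: merger fraction from the number of distorted sources
    (n_merger) and the total number of sources (n_total)."""
    return kappa * n_merger / (n_total + (kappa - 1.0) * n_merger)


def merger_fraction(p_k0, p_k1, kappa=2.0):
    """Equation 2 (for kappa = 2): merger fraction in one redshift bin from
    the probabilities of the normal (l = 0) and merger (l = 1) bins."""
    return kappa * p_k1 / (p_k0 + kappa * p_k1)


if __name__ == "__main__":
    # 10 distorted sources out of 100 give the same value as p_k1 / p_k0 = 1/9.
    print(merger_fraction_counts(10, 100))
    print(merger_fraction(0.9, 0.1))
```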

2.1. The maximum likelihood method 

The maximum likelihood method (ML method) developed here is based on García-Dabó (2002), who used this technique to determine unbiased luminosity functions. ML methods have been used in a wide range of topics in astrophysics. Arzner et al. (2007) use it to improve the determination of faint X-ray spectra; Sheth (2007) to obtain redshift and luminosity distributions in photometric surveys; Naylor & Jeffries (2006) to fit colour-magnitude diagrams; Makarov et al. (2006) to improve distance estimates using Red Giant Branch stars; and Efstathiou (2004) to analyze low cosmic microwave background multipoles from the Wilkinson Microwave Anisotropy Probe. ML methods are based on the estimation of the most probable values of a set of parameters which define the probability distribution that describes an observational sample (Davidson & MacKinnon 1993; Peña 2001).

The general ML method operates as follows. Throughout the paper we denote as P(a|b) the probability of obtaining the values a, given parameters b. With $x_i$ a vector containing all the measured values for source i in the data set and $\theta$ the parameters of the underlying multidimensional distribution that we want to estimate, we may express the joint likelihood function as

$$L(x_i|\theta) = -\ln\Big[\prod_i P(x_i|\theta)\Big] = -\sum_i \ln\big[P(x_i|\theta)\big], \qquad (3)$$

where $P(x_i|\theta)$ is the probability of obtaining $x_i$ for a given $\theta$. If we are able to express $P(x_i|\theta)$ analytically, we can minimize Equation 3 to obtain the best estimate of the parameters $\theta$, denoted as $\theta_{\rm ML}$. In our case, $x_i$ are the observed values of z and A for source i, $x_i = (z_{{\rm obs},i}, A_{{\rm obs},i})$, while $\theta = (p_{kl}, \alpha)$, where $p_{kl}$ are the probabilities which we defined in the paragraph preceding Equation 2 and $\alpha$ denotes any other fixed parameters of the distribution.

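Sources are assumed to have real redshift and asymmetry values $z_{{\rm real},i}$ and $A_{{\rm real},i}$ (not affected by observational errors) which define a bidimensional distribution $p_{kl}$ such that

$$P_{\rm 2D}(z_{{\rm real},i}, A_{{\rm real},i}\,|\,p_{kl}) = \{\,p_{kl},\ {\rm if}\ z_k \le z_{{\rm real},i} < z_{k+1},\ A_l \le A_{{\rm real},i} < A_{l+1}\,\}. \qquad (4)$$

Observational errors cause the observed $z_{{\rm obs},i}$ and $A_{{\rm obs},i}$ to differ from their respective real values $z_{{\rm real},i}$ and $A_{{\rm real},i}$. The observed $z_{{\rm obs},i}$ are assumed to be extracted from a Gaussian distribution with mean $z_{{\rm real},i}$ and standard deviation $\sigma_{z_{{\rm obs},i}}$,

$$P_{\rm G}(z_{{\rm obs},i}\,|\,z_{{\rm real},i}, \sigma_{z_{{\rm obs},i}}) = \frac{1}{\sqrt{2\pi}\,\sigma_{z_{{\rm obs},i}}} \exp\left[-\frac{(z_{{\rm obs},i} - z_{{\rm real},i})^2}{2\sigma^2_{z_{{\rm obs},i}}}\right]. \qquad (5)$$

Similarly, the observed asymmetry values $A_{{\rm obs},i}$ are assumed to be extracted from a Gaussian distribution with mean $A_{{\rm real},i}$ and standard deviation $\sigma_{A_{{\rm obs},i}}$,

$$P_{\rm G}(A_{{\rm obs},i}\,|\,A_{{\rm real},i}, \sigma_{A_{{\rm obs},i}}) = \frac{1}{\sqrt{2\pi}\,\sigma_{A_{{\rm obs},i}}} \exp\left[-\frac{(A_{{\rm obs},i} - A_{{\rm real},i})^2}{2\sigma^2_{A_{{\rm obs},i}}}\right]. \qquad (6)$$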

While the $z_{\rm phot}$ errors may not be strictly Gaussian, this is the best analytical approximation of the errors that we can make. We obtain the probability $P(x_i|\theta)$ of each source by the total probability theorem:

$$P(z_{{\rm obs},i}, A_{{\rm obs},i}\,|\,p_{kl}, \sigma_{z_{{\rm obs},i}}, \sigma_{A_{{\rm obs},i}}) = \int\!\!\int P_{\rm G}(z_{{\rm obs},i}|z_{{\rm real},i}, \sigma_{z_{{\rm obs},i}}) \times P_{\rm G}(A_{{\rm obs},i}|A_{{\rm real},i}, \sigma_{A_{{\rm obs},i}}) \times P_{\rm 2D}(z_{{\rm real},i}, A_{{\rm real},i}|p_{kl})\ dz_{{\rm real},i}\, dA_{{\rm real},i}, \qquad (7)$$

where $x_i = (z_{{\rm obs},i}, A_{{\rm obs},i})$ and $\theta = (p_{kl}, \sigma_{z_{{\rm obs},i}}, \sigma_{A_{{\rm obs},i}})$ in Equation 3, with $\alpha = (\sigma_{z_{{\rm obs},i}}, \sigma_{A_{{\rm obs},i}})$. Note that the values of $\sigma_{z_{{\rm obs},i}}$ and $\sigma_{A_{{\rm obs},i}}$ are the measured uncertainties for each source, so the only unknowns are the probabilities $p_{kl}$, which we want to estimate. Note also that we integrate over the variables $z_{{\rm real},i}$ and $A_{{\rm real},i}$, so we are not able to estimate them individually, but only the underlying bidimensional distribution $p_{kl}$ that describes the sample.

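In order to ensure that the probabilities $p_{kl}$ are not negative, we change variables, $p_{kl} = \exp(p'_{kl})$; this change keeps our problem analytic. With these new variables and after integrating Equation 7, our likelihood function, defined in Equation 3, becomes

$$L(z_{{\rm obs},i}, A_{{\rm obs},i}\,|\,p'_{kl}, \sigma_{z_{{\rm obs},i}}, \sigma_{A_{{\rm obs},i}}) = -\sum_i \ln\left\{\sum_k \sum_l \frac{e^{p'_{kl}}}{4}\, {\rm ERF}(z,i,k)\, {\rm ERF}(A,i,l)\right\}, \qquad (8)$$

where

$${\rm ERF}(\eta,i,k) \equiv {\rm erf}\left(\frac{\eta_{{\rm obs},i} - \eta_k}{\sqrt{2}\,\sigma_{\eta_{{\rm obs},i}}}\right) - {\rm erf}\left(\frac{\eta_{{\rm obs},i} - \eta_{k+1}}{\sqrt{2}\,\sigma_{\eta_{{\rm obs},i}}}\right) \qquad (9)$$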
and erf(x) is the error function. Note that in the minimization of Equation 8 the variables $p'_{kl}$ are not independent. This is due to the normalization of the distribution: the integral over the whole parameter space must be one. This imposes the following condition on the $p'_{kl}$:

$$g(p'_{kl}) \equiv \sum_k \sum_l e^{p'_{kl}}\,(z_{k+1} - z_k)(A_{l+1} - A_l) - 1 = 0. \qquad (10)$$
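
Equations 8 to 10 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the catalog layout (one tuple `(z_obs, sigma_z, A_obs, sigma_A)` per source), the nested-list layout of `p_prime[k][l]`, and all function names are hypothetical, and every per-source error is assumed to be strictly positive.

```python
import math


def erf_term(x_obs, sigma, lower, upper):
    """ERF(eta, i, k) of Equation 9 for the bin [lower, upper)."""
    scale = math.sqrt(2.0) * sigma
    return math.erf((x_obs - lower) / scale) - math.erf((x_obs - upper) / scale)


def neg_log_likelihood(p_prime, catalog, z_edges, a_edges):
    """Equation 8: -sum_i ln{ sum_kl exp(p'_kl)/4 * ERF(z,i,k) * ERF(A,i,l) }."""
    total = 0.0
    for z_obs, sigma_z, a_obs, sigma_a in catalog:
        source_prob = 0.0
        for k in range(len(z_edges) - 1):
            erf_z = erf_term(z_obs, sigma_z, z_edges[k], z_edges[k + 1])
            for l in range(len(a_edges) - 1):
                erf_a = erf_term(a_obs, sigma_a, a_edges[l], a_edges[l + 1])
                source_prob += 0.25 * math.exp(p_prime[k][l]) * erf_z * erf_a
        total -= math.log(source_prob)
    return total


def normalization(p_prime, z_edges, a_edges):
    """g(p'_kl) of Equation 10; it is zero for a normalized distribution."""
    total = 0.0
    for k in range(len(z_edges) - 1):
        for l in range(len(a_edges) - 1):
            area = (z_edges[k + 1] - z_edges[k]) * (a_edges[l + 1] - a_edges[l])
            total += math.exp(p_prime[k][l]) * area
    return total - 1.0
```

For a source with very small errors that sits well inside one bin, the two ERF factors of that bin each tend to 2 and all others tend to 0, so the source contributes $-\ln p_{kl}$ to the likelihood, as expected for an error-free histogram.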

The method for finding the extrema of a function of several variables subject to one or more constraints is known as the method of Lagrange multipliers (see e.g. Marsden & Tromba 1996 for details). It states that the function to minimize is not the target function, Equation 8, but a related one:

$$G(p'_{kl}, \lambda) = L(z_{{\rm obs},i}, A_{{\rm obs},i}\,|\,p'_{kl}, \sigma_{z_{{\rm obs},i}}, \sigma_{A_{{\rm obs},i}}) - \lambda\, g(p'_{kl}), \qquad (11)$$

where $\lambda$ is an auxiliary variable. Minimizing Equation 11 we obtain the best $p'_{kl}$ values, denoted as $p'_{kl,{\rm ML}}$.

The minimization of Equation 11 can be performed with any numerical minimization code. We used AMOEBA, which is based on the commonly used Nelder-Mead algorithm (Nelder & Mead 1965) and coded in C (Press 1995, pp. 408-412).
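
To give a feel for the kind of derivative-free search involved, the following is a minimal pure-Python downhill simplex sketch in the spirit of Nelder & Mead (1965). It is not the AMOEBA routine used in the paper; the function name, default step, tolerance, and stopping rule are assumptions chosen for illustration, and the demonstration minimizes a simple quadratic rather than Equation 11.

```python
def nelder_mead(func, start, step=0.1, tol=1e-12, max_iter=5000):
    """Minimize func(list_of_floats) with a basic downhill simplex search."""
    n = len(start)
    # Initial simplex: the start point plus one displaced point per dimension.
    simplex = [list(start)]
    for j in range(n):
        vertex = list(start)
        vertex[j] += step
        simplex.append(vertex)
    values = [func(v) for v in simplex]

    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:  # spread of function values is small
            break
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        def along(t):
            """Point on the line through the centroid, away from the worst vertex."""
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        f_ref = func(reflected)
        if f_ref < values[0]:
            expanded = along(2.0)
            f_exp = func(expanded)
            if f_exp < f_ref:
                simplex[-1], values[-1] = expanded, f_exp
            else:
                simplex[-1], values[-1] = reflected, f_ref
        elif f_ref < values[-2]:
            simplex[-1], values[-1] = reflected, f_ref
        else:
            contracted = along(-0.5)
            f_con = func(contracted)
            if f_con < values[-1]:
                simplex[-1], values[-1] = contracted, f_con
            else:
                # Shrink every vertex halfway towards the best one.
                best = simplex[0]
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, v)] for v in simplex[1:]
                ]
                values = [values[0]] + [func(v) for v in simplex[1:]]

    best_index = min(range(n + 1), key=lambda i: values[i])
    return simplex[best_index], values[best_index]


if __name__ == "__main__":
    point, value = nelder_mead(lambda v: (v[0] - 1.0) ** 2 + (v[1] + 2.0) ** 2, [0.0, 0.0])
    print(point, value)
```

In a real application the objective passed to such a routine would be built from the likelihood and the normalization constraint; a well-tested library implementation is preferable to this sketch.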

At this point we have the probabilities $p'_{kl,{\rm ML}}$. However, our goal is to obtain not only the best estimates of the probabilities, but also their associated uncertainties. The ML method states that we can obtain all the parameter covariances using a Taylor series expansion of the function $G(p'_{kl}, \lambda)$ in its variables $\theta = (p'_{kl}, \lambda)$ around the minimization point $\theta_{\rm ML} = (p'_{kl,{\rm ML}}, \lambda_{\rm ML})$, provided that the probability distributions of the $p'_{kl,{\rm ML}}$ are Gaussian, which we assume. The previous minimization process made the first derivative of G null at $\theta = \theta_{\rm ML}$, and we obtain

$$G(\theta) \simeq G(\theta_{\rm ML}) + \frac{1}{2}\,(\theta - \theta_{\rm ML})^T\, H\, (\theta - \theta_{\rm ML}), \qquad (12)$$

where $H = h_{xy}$ is the Hessian matrix evaluated at $\theta_{\rm ML}$ and T denotes the transpose vector. The inverse of the Hessian matrix gives us an estimate of the 68% confidence intervals of $p'_{kl,{\rm ML}}$, denoted as $[p'_{kl,{\rm ML}} - \sigma_{p'_{kl,{\rm ML}}},\ p'_{kl,{\rm ML}} + \sigma_{p'_{kl,{\rm ML}}}]$, and the covariances between each pair of $p'_{kl,{\rm ML}}$, denoted as ${\rm cov}(p'_{mn,{\rm ML}}, p'_{st,{\rm ML}})$, because maximum likelihood theory states that ${\rm cov}(\theta_x, \theta_y) \ge h^{-1}_{xy}$ and $\sigma^2_{\theta_x} \ge h^{-1}_{xx}$. In our case, the Hessian matrix is



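$$H = \begin{pmatrix} \dfrac{\partial^2 G}{\partial p'_{mn}\,\partial p'_{st}} & \dfrac{\partial^2 G}{\partial\lambda\,\partial p'_{mn}} \\[2ex] \dfrac{\partial^2 G}{\partial\lambda\,\partial p'_{st}} & 0 \end{pmatrix}, \qquad (13)$$

where

$$\frac{\partial^2 G}{\partial p'_{mn}\,\partial p'_{st}} = \frac{1}{16} \sum_i \frac{{\rm ERF}(z,i,m)\,{\rm ERF}(A,i,n)\,{\rm ERF}(z,i,s)\,{\rm ERF}(A,i,t)\, e^{p'_{mn}}\, e^{p'_{st}}}{\left[\sum_k \sum_l \frac{e^{p'_{kl}}}{4}\,{\rm ERF}(z,i,k)\,{\rm ERF}(A,i,l)\right]^2}, \qquad (14)$$

$$\frac{\partial^2 G}{\partial\lambda\,\partial p'_{mn}} = -(z_{m+1} - z_m)(A_{n+1} - A_n)\, e^{p'_{mn}}. \qquad (15)$$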

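Finally, the $p_{kl,{\rm ML}}$ probabilities are simply

$$p_{kl,{\rm ML}} = e^{p'_{kl,{\rm ML}}}. \qquad (16)$$

Assuming that the $p'_{kl,{\rm ML}}$ follow a Gaussian distribution, which is assured by ML theory for a large number of sources, the $p_{kl,{\rm ML}}$ follow a log-normal distribution:

$$P_{\rm LN}(p_{kl}\,|\,p'_{kl,{\rm ML}}, \sigma_{p'_{kl,{\rm ML}}}) = \frac{1}{\sqrt{2\pi}\, p_{kl}\, \sigma_{p'_{kl,{\rm ML}}}} \exp\left[-\frac{(\ln p_{kl} - p'_{kl,{\rm ML}})^2}{2\sigma^2_{p'_{kl,{\rm ML}}}}\right], \qquad (17)$$

which is highly asymmetric and whose 68% confidence interval is $[\sigma^-_{p_{kl,{\rm ML}}}, \sigma^+_{p_{kl,{\rm ML}}}]$, where

$$\sigma^-_{p_{kl,{\rm ML}}} = e^{-\sigma_{p'_{kl,{\rm ML}}}}\, p_{kl,{\rm ML}}, \qquad (18)$$

$$\sigma^+_{p_{kl,{\rm ML}}} = e^{\sigma_{p'_{kl,{\rm ML}}}}\, p_{kl,{\rm ML}}. \qquad (19)$$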



Furthermore, each $p_{k0}$ and $p_{k1}$ are connected by the covariance ${\rm cov}(p'_{k0,{\rm ML}}, p'_{k1,{\rm ML}})$, so the confidence intervals of $p_{k0}$ and $p_{k1}$ are not independent. In the next section we explain how to obtain the confidence interval of the galaxy merger fraction taking this into account.
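
The asymmetric interval of Equations 16, 18 and 19 is straightforward to evaluate once $p'_{kl,{\rm ML}}$ and its Gaussian uncertainty are known. The helper below is a hypothetical sketch, not code from the paper; it returns the lower bound, the central value, and the upper bound.

```python
import math


def lognormal_interval(p_prime_ml, sigma_p_prime):
    """Return (lower, p_ml, upper) following Equations 16, 18 and 19.

    p_prime_ml is ln(p_kl) at the likelihood minimum and sigma_p_prime is its
    Gaussian 1-sigma uncertainty taken from the inverse Hessian.
    """
    p_ml = math.exp(p_prime_ml)
    lower = math.exp(-sigma_p_prime) * p_ml
    upper = math.exp(sigma_p_prime) * p_ml
    return lower, p_ml, upper


if __name__ == "__main__":
    low, mid, high = lognormal_interval(math.log(0.5), 0.1)
    # The upper error bar is longer than the lower one.
    print(mid - low, high - mid)
```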

2.2. The galaxy merger fraction 

Expressing the galaxy merger fraction in the range $[z_k, z_{k+1})$ (Equation 2) as a function of the output variables of the ML method we obtain:

$$f^{\rm ML}_{{\rm gm},k} = \frac{2\, p_{k1,{\rm ML}}}{p_{k0,{\rm ML}} + 2\, p_{k1,{\rm ML}}}. \qquad (20)$$

However, we cannot obtain the 68% confidence interval of $f^{\rm ML}_{{\rm gm},k}$, defined as $[\sigma^-_{f^{\rm ML}_{{\rm gm},k}}, \sigma^+_{f^{\rm ML}_{{\rm gm},k}}]$, by applying the usual error theory, which is based on Gaussianity of the variables, because the probability distribution of each $p_{kl,{\rm ML}}$ is log-normal. Furthermore, the problem is not analytic and we cannot obtain a mathematical description of the $f_{{\rm gm},k}$ probability distributions. We made Monte Carlo simulations to characterize the probability distribution of each $f_{{\rm gm},k}$. The simulations showed that the $f_{{\rm gm},k}$ distributions can be fit with a log-normal:

$$P_{\rm LN}(f_{{\rm gm},k}\,|\,f^{\rm ML}_{{\rm gm},k}, \sigma_f) = \frac{1}{\sqrt{2\pi}\, f_{{\rm gm},k}\, \sigma_f} \exp\left[-\frac{(\ln f_{{\rm gm},k} - \ln f^{\rm ML}_{{\rm gm},k})^2}{2\sigma_f^2}\right], \qquad (21)$$

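where $\sigma_f$ is the only free parameter of the fit. Finally, the 68% confidence interval of $f^{\rm ML}_{{\rm gm},k}$ is given by

$$\sigma^-_{f^{\rm ML}_{{\rm gm},k}} = e^{-\sigma_f}\, f^{\rm ML}_{{\rm gm},k}, \qquad (22)$$

$$\sigma^+_{f^{\rm ML}_{{\rm gm},k}} = e^{\sigma_f}\, f^{\rm ML}_{{\rm gm},k}. \qquad (23)$$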
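
The Monte Carlo step of this section can be sketched as follows. The sketch draws correlated Gaussian pairs $(p'_{k0}, p'_{k1})$, converts each draw into a merger fraction with Equation 20, and summarizes the spread of $\ln f$ about $\ln f^{\rm ML}$. Two points are assumptions rather than the paper's procedure: the width $\sigma_f$ is estimated as the rms of $\ln f - \ln f^{\rm ML}$ instead of by fitting Equation 21 to a histogram, and the function name and argument list are hypothetical. The standard deviation `s0` must be positive.

```python
import math
import random


def merger_fraction_interval(pp0, pp1, s0, s1, cov01, n_draws=20000, seed=1):
    """Return (f_ml, sigma_f, lower, upper) for one redshift bin.

    pp0, pp1 : ML values of p'_k0 and p'_k1
    s0, s1   : their Gaussian standard deviations (s0 > 0)
    cov01    : covariance between p'_k0 and p'_k1
    """
    rng = random.Random(seed)
    f_ml = 2.0 * math.exp(pp1) / (math.exp(pp0) + 2.0 * math.exp(pp1))

    # Cholesky factor of the 2x2 covariance matrix [[s0^2, cov], [cov, s1^2]].
    l10 = cov01 / s0
    l11 = math.sqrt(max(s1 * s1 - l10 * l10, 0.0))

    sum_sq = 0.0
    for _ in range(n_draws):
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        p0 = math.exp(pp0 + s0 * u)
        p1 = math.exp(pp1 + l10 * u + l11 * v)
        f = 2.0 * p1 / (p0 + 2.0 * p1)
        sum_sq += (math.log(f) - math.log(f_ml)) ** 2

    sigma_f = math.sqrt(sum_sq / n_draws)
    # Equations 22 and 23.
    return f_ml, sigma_f, math.exp(-sigma_f) * f_ml, math.exp(sigma_f) * f_ml
```

When mergers are rare ($p_{k1} \ll p_{k0}$), $\ln f \simeq \ln 2 + p'_{k1} - p'_{k0}$, so $\sigma_f$ should approach $\sqrt{s_0^2 + s_1^2 - 2\,{\rm cov}}$; this gives a quick sanity check of the sampling.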



3. SIMULATIONS WITH SYNTHETIC 
CATALOGS 

The accuracy and reliability of the ML method 
can be tested using synthetic catalogs. This is 
an important step since ML theory warns that 
the estimated parameters may suffer from biases; 
convergence is only assured for large number of 
sources. The approach is to create catalogs with 
predefined underlying distribution parameters and 
compare with the estimated ML parameters. Note 
that the inputs of the ML method are the same 
whether we have a real catalog or a synthetic 
one. In the following paragraphs, we first explain 
how we created the synthetic catalogs in a gen- 
eral case, and later define and justify the input 
parameters used for the synthetic catalogs in this 
paper. Given the high number of variables used in the following discussion, we provide their precise definitions in Table 2.

We created the synthetic catalogs as follows: first we took n sources distributed in redshift and asymmetry space following a bidimensional distribution defined by the input probabilities $p_{kl,{\rm in}}$. This process yielded the $z_{{\rm in},i}$ and $A_{{\rm in},i}$ values of the n sources of our synthetic catalogs, which play the role of $z_{{\rm real},i}$ and $A_{{\rm real},i}$ in Equation 4. Next, we applied the experimental errors: following Equation 5 we obtained the simulated $z_{{\rm sim},i}$ values, which play the role of $z_{{\rm obs},i}$, as drawn from a Gaussian distribution with mean $z_{{\rm in},i}$ and standard deviation $\sigma_{z_{{\rm sim},i}}$; the latter plays the role of $\sigma_{z_{{\rm obs},i}}$. The value of $\sigma_{z_{{\rm sim},i}}$ is a positive value obtained also from a Gaussian distribution with mean $\overline{\sigma_z}$ and standard deviation $\sigma_{\sigma_z}$. The process was repeated following Equation 6 to obtain the simulated $A_{{\rm sim},i}$ and its standard deviation $\sigma_{A_{{\rm sim},i}}$. Finally, we applied the ML method to the synthetic catalog to obtain $p'_{kl,{\rm ML}}$ and $\sigma_{p'_{kl,{\rm ML}}}$. Summarizing, the input parameters of our simulations were the bidimensional distribution $p_{kl,{\rm in}}$, n, $\overline{\sigma_z}$, $\sigma_{\sigma_z}$, $\overline{\sigma_A}$, and $\sigma_{\sigma_A}$, and the output parameters were $p'_{kl,{\rm ML}}$ and $\sigma_{p'_{kl,{\rm ML}}}$.

We defined three intervals in redshift ($k = 0, 1, 2$) with $\Delta z = 0.4$ and $z \in [0, 1.2)$, and two in asymmetry ($l = 0, 1$) with $\Delta A = 0.7$ and $A \in [-0.35, 1.05)$. Distorted sources with $A > A_{\rm m} = 0.35$ (see C03 for details about the determination of this limit value) are described by $p'_{k1,{\rm in}}$, while normal sources are described by $p'_{k0,{\rm in}}$. We list in Table 1 the redshift and asymmetry intervals, as well as the probabilities $p_{kl,{\rm in}}$ and $p'_{kl,{\rm in}} = \ln p_{kl,{\rm in}}$, that define the input bidimensional distribution of our synthetic catalogs. The $p'_{kl,{\rm in}}$ values in Table 1 do not match any particular observational determination of these quantities, but they follow the general behavior inferred from observed galaxy merger fractions: highly asymmetric galaxies are less frequent than low-asymmetry galaxies up to $z = 1.2$ (Conselice et al. 2003; Cassata et al. 2005; Bridge et al. 2007; Kampczyk et al. 2007), so the $p'_{k1,{\rm in}}$ are lower than the $p'_{k0,{\rm in}}$. The number of highly asymmetric galaxies increases with redshift in the range $z \in [0, 1.2)$ (Conselice et al. 2003), so the $p'_{k1,{\rm in}}$ increase with redshift. Several studies present a maximum at intermediate z in the redshift distribution of galaxies in optically selected samples (e.g. Grazian et al. 2006), so the $p_{k0,{\rm in}} + p_{k1,{\rm in}}$ values have a maximum in the interval $z \in [0.4, 0.8)$. We can check that the $p'_{kl,{\rm in}}$ are normalized following Equation 10. Although we present here this particular bidimensional distribution, we carried out the same study with other distributions, and the results were similar.

For convenience we express the experimental dispersions using the dimensionless variables

$$\sigma_{{\rm bin},z} \equiv \frac{\overline{\sigma_z}}{\Delta z}, \qquad (24)$$

$$\sigma_{{\rm bin},A} \equiv \frac{\overline{\sigma_A}}{\Delta A}. \qquad (25)$$

We used the same value of both variables in each simulation, that is, we used $\sigma_{\rm bin} = \sigma_{{\rm bin},z} = \sigma_{{\rm bin},A}$. Because we fixed the values of $\Delta z = 0.4$ and $\Delta A = 0.7$, $\sigma_{\rm bin}$ unequivocally defines $\overline{\sigma_z}$ and $\overline{\sigma_A}$. It is important to notice that, when we work with observational data, the situation is the opposite: our data define $\overline{\sigma_z}$ and $\overline{\sigma_A}$, and we should choose the most appropriate values of $\Delta z$ and $\Delta A$. We made simulations for $\sigma_{\rm bin} = 0$ as a check corresponding to null experimental errors, $\sigma_{\rm bin} = 0.25$ and 0.5 as typical observational cases, and $\sigma_{\rm bin} = 1.0$ as an extreme case to explore the applicability limits of the ML method. The values of $\sigma_{\sigma_z}$ and $\sigma_{\sigma_A}$ were half of $\overline{\sigma_z}$ and $\overline{\sigma_A}$, respectively, in all cases.
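
The catalog construction described above can be sketched as follows. The input probabilities used in the demonstration are illustrative placeholders, not the Table 1 values, and the function names and the row layout `(z_sim, sigma_z, A_sim, sigma_A)` are assumptions. Sources are placed uniformly inside the chosen bin, which is what the piecewise-constant distribution of Equation 4 implies, and equal-width bins are assumed when converting $\sigma_{\rm bin}$ into mean errors.

```python
import random


def draw_positive(rng, mean, spread):
    """Draw a positive error size from a Gaussian; return 0 for null errors."""
    if mean <= 0.0:
        return 0.0
    while True:
        value = rng.gauss(mean, spread)
        if value > 0.0:
            return value


def perturb(rng, value, sigma):
    """Apply a Gaussian observational error of size sigma to value."""
    return value if sigma == 0.0 else rng.gauss(value, sigma)


def make_catalog(n, p_in, z_edges, a_edges, sigma_bin, seed=0):
    """Return n rows (z_sim, sigma_z, A_sim, sigma_A) of a synthetic catalog."""
    rng = random.Random(seed)
    bins = [(k, l) for k in range(len(z_edges) - 1) for l in range(len(a_edges) - 1)]
    # Probability of each bin is the density p_kl times the bin area.
    weights = [
        p_in[k][l] * (z_edges[k + 1] - z_edges[k]) * (a_edges[l + 1] - a_edges[l])
        for k, l in bins
    ]
    # Equations 24 and 25: mean errors from the dimensionless error size.
    mean_sz = sigma_bin * (z_edges[1] - z_edges[0])
    mean_sa = sigma_bin * (a_edges[1] - a_edges[0])

    catalog = []
    for _ in range(n):
        k, l = rng.choices(bins, weights=weights)[0]
        z_in = rng.uniform(z_edges[k], z_edges[k + 1])
        a_in = rng.uniform(a_edges[l], a_edges[l + 1])
        sigma_z = draw_positive(rng, mean_sz, 0.5 * mean_sz)
        sigma_a = draw_positive(rng, mean_sa, 0.5 * mean_sa)
        catalog.append((perturb(rng, z_in, sigma_z), sigma_z,
                        perturb(rng, a_in, sigma_a), sigma_a))
    return catalog


if __name__ == "__main__":
    # Placeholder densities (relative values only; choices() normalizes them).
    example_p = [[1.0, 0.2], [1.5, 0.4], [0.8, 0.3]]
    rows = make_catalog(5, example_p, [0.0, 0.4, 0.8, 1.2], [-0.35, 0.35, 1.05], 0.25)
    for row in rows:
        print(row)
```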

We ran models with n = 50, 100, and 1000 
to check catalog size effects. We took these val- 
ues because we expect experimental catalogs of a 
few hundred sources or more and we are interested 
in the applicability limits of the method to small 
samples. 

In order to study how the ML parameters compare with the input parameters, we must perform several simulations and study how the parameters $p'_{kl,{\rm ML}}$ are distributed. Hence, for each n and $\sigma_{\rm bin}$ case we create a simulation set of $N = 1000$ independent synthetic catalogs.

The results of the simulations are shown in Figure 1 and in Tables 3, 4, and 5. Figure 1 shows $p'_{kl,{\rm ML}}$ for samples of $n = 1000$ sources (crosses), with error bars showing their 68% confidence intervals; for comparison, the input probabilities $p'_{kl,{\rm in}}$ are shown as black circles, and the $p'_{kl,{\rm class}}$, obtained by drawing a classical histogram (as defined below in Section 3.1), are shown as gray triangles, also for $n = 1000$ catalogs. In Figure 1, panels a, b, and c correspond to increasing values of the experimental errors, defined in Equations 24 and 25 and shown in the legend; panels a, b, c may be taken to respectively describe 'good', 'typical', and 'bad' observational errors as compared to the z and A bin sizes. The top/bottom panels show $p'_{kl}$ for the low/high-asymmetry bins. Within each panel, values for the three redshift bins are shown, as labeled on the horizontal axes. We provide the results in tabular format in Tables 3, 4, and 5, corresponding to simulations with sample sizes of n = 50, 100, and 1000, respectively.



3.1. Classical bidimensional distribution 

Before presenting the results of the ML method, we analyze the estimation of the $p_{kl}$ parameters using the classical bidimensional histogram of the $z_{{\rm sim},i}$ and $A_{{\rm sim},i}$ data. We translate the histogram occupation numbers $n_{kl}$ to probabilities $p'_{kl,{\rm class}}$ using

$$p'_{kl,{\rm class}} = \ln\left(\frac{n_{kl}}{\Delta z\, \Delta A \sum_k \sum_l n_{kl}}\right), \qquad (26)$$

where $n_{kl}$ is the number of sources with $(z_{{\rm sim},i}, A_{{\rm sim},i})$ within the $[z_k, z_{k+1}) \times [A_l, A_{l+1})$ bin. We want to study how the classical method compares with the input parameters. The distribution of the N values of $p'_{kl,{\rm class}}$ in one simulation set can be represented by its median $\overline{p'}_{kl,{\rm class}}$ and standard deviation $s_{p'_{kl,{\rm class}}}$. In Tables 3-5 we can see that the classical bidimensional distribution recovers the input probabilities in the case of null experimental errors and large n, as expected. However, the recovered distribution begins to deviate from the shape of the input bidimensional distribution when $\sigma_{\rm bin}$ increases, as we can also see in Figure 1: the classical bidimensional distribution (gray triangles) is smoothed by experimental errors and does not estimate well the underlying bidimensional distribution (black circles). We study this in detail in Section 3.3.
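
Equation 26 amounts to a normalized two-dimensional histogram expressed in logarithmic form. The sketch below is one possible implementation under assumptions: the function name is hypothetical, sources falling outside the grid are ignored, and empty bins are assigned $-\infty$.

```python
import bisect
import math


def classical_log_probabilities(points, z_edges, a_edges):
    """Equation 26: p'_kl,class from a list of (z_sim, A_sim) pairs."""
    n_z, n_a = len(z_edges) - 1, len(a_edges) - 1
    counts = [[0] * n_a for _ in range(n_z)]
    for z, a in points:
        k = bisect.bisect_right(z_edges, z) - 1  # half-open bins [z_k, z_k+1)
        l = bisect.bisect_right(a_edges, a) - 1
        if 0 <= k < n_z and 0 <= l < n_a:
            counts[k][l] += 1
    total = sum(sum(row) for row in counts)

    result = []
    for k in range(n_z):
        row = []
        for l in range(n_a):
            area = (z_edges[k + 1] - z_edges[k]) * (a_edges[l + 1] - a_edges[l])
            if counts[k][l] > 0:
                row.append(math.log(counts[k][l] / (area * total)))
            else:
                row.append(float("-inf"))  # empty bin: zero probability
        result.append(row)
    return result
```

Because the observed values are binned directly, any source scattered across a bin edge by its errors is counted in the wrong bin, which is the origin of the smoothing discussed in the text.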

3.2. The ML method in absence of exper- 
imental errors 

We first test that the ML method, in the case of null experimental errors, recovers the input bidimensional distribution, i.e., that it reduces to the classical method. We can see in Tables 3-5 that the values of $\overline{p'}_{kl,{\rm class}}$ and the median of the N values recovered by the ML method, denoted as $\overline{p'}_{kl,{\rm ML}}$, are the same in all cases. This also happens with the values of $s_{p'_{kl,{\rm class}}}$ and the standard deviations of $p'_{kl,{\rm ML}}$, denoted as $s_{p'_{kl,{\rm ML}}}$. This indicates that the ML method does not introduce systematic effects on the results.

3.3. The ML method with non-null exper- 
imental errors 

We now examine how well the ML and classical methods recover the input probabilities $p'_{kl,{\rm in}}$ when non-null experimental errors are included in the synthetic catalogs. We use the $n = 1000$ source catalogs as an example, which is representative of the general trends. The results are shown in Figure 1, and are tabulated in Table 5. It is clear from Figure 1 that the $p'_{kl,{\rm ML}}$ (crosses) recover the input probabilities $p'_{kl,{\rm in}}$ (black circles) in all cases, including those in which the inserted errors are as large as the bin size (panels c). From Table 5 we see that the values of $p'_{kl,{\rm in}}$ always lie within the 68% confidence interval of the ML method, defined by $[p'_{kl,{\rm ML}} - \sigma_{p'_{kl,{\rm ML}}},\ p'_{kl,{\rm ML}} + \sigma_{p'_{kl,{\rm ML}}}]$. This shows that the ML method is reliable. In contrast, the probabilities $p'_{kl,{\rm class}}$ derived from the classical histogram (gray triangles in Figure 1) systematically deviate from the input probabilities. Probabilities are systematically underestimated/overestimated in the low/high-asymmetry bins (upper/lower panels), due to a spill-over from the most populated bins (low asymmetries) to the least populated, high-asymmetry bins. Such deviations increase for larger experimental errors. When the errors are as large as the bin size, spill-over is so pronounced that the probabilities in the high-asymmetry sample (lower right panel) are nearly equal for the three redshift bins, and all information on the redshift variation of the galaxy merger fractions is lost.

Fig. 1. Results of running the ML method over N = 1000 synthetic catalogs with n = 1000 sources each for different experimental errors: (a) $\sigma_{\rm bin} = 0.25$, (b) $\sigma_{\rm bin} = 0.5$, and (c) $\sigma_{\rm bin} = 1$. In all figures black circles are the input bidimensional probabilities $p'_{kl,{\rm in}}$, gray triangles are the classical bidimensional probabilities $p'_{kl,{\rm class}}$, and crosses are the ML bidimensional probabilities $p'_{kl,{\rm ML}}$. The error bars are the 68% confidence intervals given by the ML method, $[p'_{kl,{\rm ML}} - \sigma_{p'_{kl,{\rm ML}}},\ p'_{kl,{\rm ML}} + \sigma_{p'_{kl,{\rm ML}}}]$.

We conclude that the ML method is an unbiased estimator of the input distribution. To put this statement on a more quantitative basis, we carry out a Student's t-test (Collins 1990). We define our estimator as

$$T_{kl,{\rm ML}} = \sqrt{N}\ \frac{|p'_{kl,{\rm in}} - \overline{p'}_{kl,{\rm ML}}|}{s_{p'_{kl,{\rm ML}}}}, \qquad (27)$$

and accept that $p'_{kl,{\rm in}} = \overline{p'}_{kl,{\rm ML}}$ with 99% confidence when $T_{kl,{\rm ML}} < 2.6$. We define in the same way the variable $T_{kl,{\rm class}}$ to study the accuracy of the $p'_{kl,{\rm class}}$ as an estimator of the $p'_{kl,{\rm in}}$. We calculate the median of the $T_{kl,{\rm ML}}$ and $T_{kl,{\rm class}}$ for each simulation set, denoted as $T_{\rm ML}$ and $T_{\rm class}$ respectively, to make a comparison between different n and $\sigma_{\rm bin}$.
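
A compact way to evaluate a statistic of this kind for one bin is sketched below. It assumes the form $T = \sqrt{N}\,|p'_{\rm in} - \overline{p'}| / s$, with the median of the N recovered values as the central estimate and their sample standard deviation as s; the function name is hypothetical.

```python
import math
import statistics


def t_statistic(recovered_values, input_value):
    """T for one (k, l) bin from the N recovered p'_kl of a simulation set."""
    n_catalogs = len(recovered_values)
    centre = statistics.median(recovered_values)
    spread = statistics.stdev(recovered_values)
    return math.sqrt(n_catalogs) * abs(input_value - centre) / spread


if __name__ == "__main__":
    unbiased = [0.9, 1.0, 1.1, 1.0, 1.0]
    biased = [1.9, 2.0, 2.1, 2.0, 2.0]
    print(t_statistic(unbiased, 1.0))  # well below the 2.6 threshold
    print(t_statistic(biased, 1.0))    # far above the 2.6 threshold
```

The $\sqrt{N}$ factor explains why a fixed bias becomes more significant as the scatter shrinks: the statistic measures the offset in units of the standard error of the central value, not of the scatter itself.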

The results are summarized in Tables 3-5 and in Figure 2. We can see that $T_{\rm ML}$ is below the confidence level for all n and $\sigma_{\rm bin}$: the $p'_{kl,{\rm ML}}$ are good estimators of the $p'_{kl,{\rm in}}$, as desired. In contrast, the classical method is far from the confidence condition even in the $\sigma_{\rm bin} = 0.25$ case, and $T_{\rm class}$ increases with $\sigma_{\rm bin}$. Besides, having a large n does not improve the results of the classical method: the $p'_{kl,{\rm class}}$ values are similar for every n, but the errors are reduced when increasing n, making $T_{\rm class}$ higher. That is, having a large observational sample affected by experimental errors does not improve the estimation of $p'_{kl,{\rm in}}$, and the $p'_{kl,{\rm class}}$ errors are underestimated. This bias affects the galaxy merger fractions obtained from $p'_{kl,{\rm class}}$, as we can see in Section 3.7.

Fig. 2. Variation of $T_{\rm ML}$ (black symbols) and $T_{\rm class}$ (gray symbols) with dimensionless experimental error size $\sigma_{\rm bin}$. Triangles are for n = 50, circles for n = 100, and squares for n = 1000 source catalogs. The solid line is the 99% confidence limit T = 2.6.

3.4. Study of $\sigma_{p'_{kl,{\rm ML}}}$

When we apply the ML method to an observational sample we obtain an estimate of the $p'_{kl,{\rm ML}}$ 68% confidence intervals, $[p'_{kl,{\rm ML}} - \sigma_{p'_{kl,{\rm ML}}},\ p'_{kl,{\rm ML}} + \sigma_{p'_{kl,{\rm ML}}}]$, and we want to know if these confidence intervals are representative of the $p'_{kl}$ probability distributions. They are representative if the median of the N values of $\sigma_{p'_{kl,{\rm ML}}}$, denoted $\overline{\sigma}_{p'_{kl,{\rm ML}}}$, is similar to $s_{p'_{kl,{\rm ML}}}$. To study this issue we perform a Fisher's variance test (Collins 1990, p. 234). We define our estimator as

$$F_{kl} = \frac{\max(\overline{\sigma}_{p'_{kl,{\rm ML}}},\ s_{p'_{kl,{\rm ML}}})^2}{\min(\overline{\sigma}_{p'_{kl,{\rm ML}}},\ s_{p'_{kl,{\rm ML}}})^2}, \qquad (28)$$

and accept that $s_{p'_{kl,{\rm ML}}} = \overline{\sigma}_{p'_{kl,{\rm ML}}}$ with 99% confidence when $F_{kl} < 1.18$. We calculate the median of the $F_{kl}$ for each simulation set, denoted as F, to make a comparison between different n and $\sigma_{\rm bin}$. The results are summarized in Tables 3-5 and in Figure 3. We can see that $s_{p'_{kl,{\rm ML}}} = \overline{\sigma}_{p'_{kl,{\rm ML}}}$ for all n when $\sigma_{\rm bin} = 0.25, 0.5$. Only when $\sigma_{\rm bin} = 1.0$ and the samples are small (n = 50, 100) does F lie above the confidence limit.

Fig. 3. Variation of F with dimensionless experimental error size $\sigma_{\rm bin}$. Triangles are for n = 50, circles for n = 100, and squares for n = 1000 source catalogs. The solid line is the 99% confidence limit F = 1.18.

These results imply that the ML method supplies reliable confidence intervals of $p'_{kl,{\rm ML}}$ with samples of a thousand sources or, with fewer sources, if the experimental errors are at most half of the histogram bin size.

The differences between $s_{p'_{kl,{\rm ML}}}$ and $\overline{\sigma}_{p'_{kl,{\rm ML}}}$ have two origins. The main effect comes from the fact that the probability distributions of $p'_{kl,{\rm ML}}$ are not perfectly Gaussian, and we had assumed Gaussianity to obtain $\sigma_{p'_{kl,{\rm ML}}}$ analytically. We study this issue in the next section. The other effect is that we evaluated the theoretical values of $\sigma_{p'_{kl,{\rm ML}}}$ at $p'_{kl,{\rm ML}}$: the minimization method AMOEBA is not perfect and we may have found a local minimum of Equation 11 instead of the absolute minimum (see Section 3.6).
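
The variance-ratio statistic of Equation 28 is simple enough to state directly in code. The function name below is hypothetical; the 1.18 threshold is the value quoted in the text for the N = 1000 simulation sets and would differ for another number of catalogs.

```python
def variance_ratio(sigma_median, s_sample):
    """Equation 28: ratio of the larger to the smaller variance (always >= 1)."""
    var_theory = sigma_median ** 2
    var_sample = s_sample ** 2
    return max(var_theory, var_sample) / min(var_theory, var_sample)


if __name__ == "__main__":
    # A 5% mismatch in the standard deviations passes, a 10% mismatch does not.
    print(variance_ratio(0.100, 0.105) < 1.18)
    print(variance_ratio(0.100, 0.110) < 1.18)
```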



3.5. Probability distributions of p' kl 

In the analytical estimation of the p' kl ML co- 
variances we assumed that the p' kl ML probabil- 



S 



ity distributions are Gaussian. To check this as- 
sumption we made a histogram of the N values 
of p' kl ML to obtain the shape of the p' kl ML proba- 
bility distribution, which we want to approximate 
by a Gaussian with mean p' kl ML and standard de- 
viation s„? . We tested this Gaussian approxi- 



We also find that s r 



increases with <7bin, but 



mation with a Kolmogorov-Smirnov test ( Collins! 
19901 p. 235). 

We saw that the Gaussian distribution approx- 
imation was valid for all (7bi n in the n — 1000 
simulation sets. The situation of the n = 50 and 
100 simulation sets was more complicated. For 
n = 100 the p' k0 ML Gaussian approximation was 
valid for all <7bi n , while the p' kl ML started to be 
non Gaussian for <7bm = 0.5, and we could not 
assume Gaussianity for cr bin = 1.0. For n — 50 
simulations we could not assume Gaussian approx- 
imation from (Tbin = 0.25 to the p' kl ML and from 
cr b in = 0.5 to the p' kQML . 

These results emphasize that one must check 
the Gaussian approximation of the p' kl ML proba- 
bility distributions in each case. That is, when ap- 
plying the ML method to an experimental catalog 
it is essential to make special simulations aimed at 
verifying the Gaussianity of the recovered proba- 
bilities. 

3.6. The standard deviation of the ML 
method due to iterative minimization 

The iterative minimization method AMOEBA used to obtain the minimum of
Equation (11) can introduce an error in the determination of
$p'_{kl,{\rm ML}}$ if the method converges to a local minimum.
Moreover, increasing the experimental errors relaxes the conditions on
the absolute minimum and makes it more probable that the method
converges onto one such local minimum. To study this effect and its
importance, we apply the ML method $N = 100$ times to the same catalog,
one catalog per simulation set. We define the variable
$s_{p'_{kl},{\rm iter}}$ as the dispersion of the $N$ values of the
recovered probabilities $p'_{kl,{\rm ML}}$. We find that the values of
$s_{p'_{kl},{\rm iter}}$ depend on the tolerance and the maximum number
of iterations of the minimization method. We take a $10^{-15}$
tolerance and 5000 iterations as optimal values: a smaller tolerance or
more iterations do not reduce $s_{p'_{kl},{\rm iter}}$, but increase the
computational time. All final simulations presented in this paper were
made with these optimal values.

We also find that $s_{p'_{kl},{\rm iter}}$ increases with
$\sigma_{\rm bin}$, but is ~5 times smaller than $s_{p'_{kl}}$ in the
worst experimental error case, so the standard deviations of the
probabilities are only slightly affected by this effect. Even so, when
applying the ML method to an experimental catalog, it is safe practice
to apply it more than once, as a precaution against local solutions and
iteration bias.
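The repeated-application check can be sketched as follows. This is only an illustration under stated assumptions: a simple compass (pattern) search stands in for AMOEBA and is not the downhill simplex method, the objective is a hypothetical smooth function rather than Equation (11), and the repeated runs are assumed to differ in their random starting points. The tolerance and iteration limit mirror the values quoted above.

```python
import random
import statistics


def compass_search(func, x0, step=0.5, tol=1e-15, max_iter=5000):
    """Derivative-free stand-in for AMOEBA (NOT the downhill simplex)."""
    x = list(x0)
    fx = func(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                ft = func(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
        if not improved:
            # No neighbor is better: refine the search scale.
            step *= 0.5
    return x


def iteration_dispersion(func, starts):
    """Minimize `func` from every start; return per-parameter (mean, std)."""
    results = [compass_search(func, x0) for x0 in starts]
    columns = list(zip(*results))
    return [(statistics.mean(c), statistics.stdev(c)) for c in columns]


def toy_objective(p):
    """Hypothetical objective with a single minimum at (-0.34, -1.72)."""
    return (p[0] + 0.34) ** 2 + 0.5 * (p[1] + 1.72) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    starts = [[rng.uniform(-3, 1), rng.uniform(-3, 1)] for _ in range(100)]
    for mean, s_iter in iteration_dispersion(toy_objective, starts):
        print(f"mean = {mean:.5f}, s_iter = {s_iter:.2e}")
```

For a well-behaved objective the dispersion between runs is negligible; a dispersion comparable to the statistical scatter of the recovered probabilities would signal convergence onto local minima.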

3.7. The galaxy merger fraction 

In the previous sections we have seen that the experimental errors
modify the input bidimensional distribution, biasing the classical
method estimates, whereas the ML method is able to recover the input
bidimensional distribution. In this section we study the general effect
and trends that the experimental errors introduce in the galaxy merger
fraction determination. To obtain the galaxy merger fraction by the ML
method we follow Section 2.2. First, we determine the galaxy merger
fraction $f^{\rm ML}_{{\rm gm},k}$ by applying Equation (16) to the
$p'_{kl,{\rm ML}}$ probabilities in Tables 3-5. Next, we perform Monte
Carlo simulations with these $f^{\rm ML}_{{\rm gm},k}$ values and the
$p'_{kl,{\rm ML}}$ and $\sigma_{p'_{kl,{\rm ML}}}$ in Tables 3-5 to
characterize the probability distribution of $f^{\rm ML}_{{\rm gm},k}$,
obtaining the 68% confidence interval
$[f^{\rm ML}_{{\rm gm},k} - \sigma^{-}_{f^{\rm ML}_{{\rm gm},k}},\,
f^{\rm ML}_{{\rm gm},k} + \sigma^{+}_{f^{\rm ML}_{{\rm gm},k}}]$ with
Equations (22) and (...).
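A Monte Carlo characterization of this kind can be sketched as below. The sketch is illustrative and rests on assumptions: it takes the merger fraction to have the form $2e^{p'_{k1}}/(e^{p'_{k0}} + 2e^{p'_{k1}})$ (which reproduces, to rounding, the input fractions of Table 6 from the Table 1 probabilities), draws the two log-probabilities as independent Gaussians (covariances are ignored), and reads the interval from the 16th and 84th percentiles rather than from Equation (22). All function names are hypothetical.

```python
import math
import random


def merger_fraction(p0_log, p1_log):
    """Merger fraction from the log-probabilities of the l = 0, 1 bins.

    Assumed form: f = 2 e^{p1} / (e^{p0} + 2 e^{p1}).
    """
    a, b = math.exp(p0_log), math.exp(p1_log)
    return 2.0 * b / (a + 2.0 * b)


def monte_carlo_interval(p0_log, sig0, p1_log, sig1, draws=20000, seed=0):
    """Return (f, lower, upper) with a 68% percentile interval.

    Assumption: p0_log and p1_log are independent Gaussian variables.
    """
    rng = random.Random(seed)
    samples = sorted(
        merger_fraction(rng.gauss(p0_log, sig0), rng.gauss(p1_log, sig1))
        for _ in range(draws)
    )
    lower = samples[int(0.16 * draws)]
    upper = samples[int(0.84 * draws)]
    return merger_fraction(p0_log, p1_log), lower, upper


if __name__ == "__main__":
    # Input log-probabilities of the first redshift bin (Table 1) with
    # illustrative dispersions of the size found for n = 1000 sources.
    f, lo, hi = monte_carlo_interval(-0.33647, 0.077, -1.72277, 0.184)
    print(f"f = {f:.4f}  (-{f - lo:.3f}, +{hi - f:.3f})")
```

Because the transformation from log-probabilities to the fraction is nonlinear, the resulting interval is in general asymmetric around the central value.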



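The galaxy merger fraction by the classical method is, applying
Equation (2),

$$ f^{\rm class}_{{\rm gm},k} = \frac{2\, e^{p'_{k1,{\rm class}}}}{e^{p'_{k0,{\rm class}}} + 2\, e^{p'_{k1,{\rm class}}}}, \tag{29} $$

while its 68% confidence interval
$[f^{\rm class}_{{\rm gm},k} - \sigma_{f^{\rm class}_{{\rm gm},k}},\,
f^{\rm class}_{{\rm gm},k} + \sigma_{f^{\rm class}_{{\rm gm},k}}]$ is
obtained applying the usual error theory to Equation (29):

$$ \sigma_{f^{\rm class}_{{\rm gm},k}} = \frac{2\, e^{p'_{k0,{\rm class}}}\, e^{p'_{k1,{\rm class}}}}{\left(e^{p'_{k0,{\rm class}}} + 2\, e^{p'_{k1,{\rm class}}}\right)^{2}} \sqrt{s^{2}_{p'_{k0,{\rm class}}} + s^{2}_{p'_{k1,{\rm class}}}}. \tag{30} $$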
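As a numerical check of Equations (29) and (30), the short sketch below evaluates the classical fraction $2e^{p'_{k1}}/(e^{p'_{k0}} + 2e^{p'_{k1}})$ and its propagated error for the first redshift bin of the $n = 1000$, $\sigma_{\rm bin} = 0.25$ simulation set, using the classical medians and standard deviations listed in Table 5. The function name is hypothetical, and the two log-probabilities are assumed to be uncorrelated, as in the standard error propagation.

```python
import math


def classical_merger_fraction(p0_log, p1_log, s0, s1):
    """Classical merger fraction and its 1-sigma error.

    p0_log, p1_log: log-probabilities of the l = 0 and l = 1 asymmetry bins.
    s0, s1: their standard deviations (assumed uncorrelated).
    """
    a = math.exp(p0_log)
    b = math.exp(p1_log)
    f = 2.0 * b / (a + 2.0 * b)
    # First-order propagation: |df/dp0| = |df/dp1| = 2ab / (a + 2b)^2.
    sigma_f = 2.0 * a * b / (a + 2.0 * b) ** 2 * math.sqrt(s0 ** 2 + s1 ** 2)
    return f, sigma_f


if __name__ == "__main__":
    f, sigma_f = classical_merger_fraction(-0.41246, -1.41729,
                                           0.07170, 0.12568)
    print(f"f_class = {f:.3f} +/- {sigma_f:.3f}")
```

With these inputs the sketch gives 0.423 and 0.035 to three decimals, matching the corresponding classical entry of Table 6.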



Because of the limits on the experimental errors that the ML method can
handle, which we noted in the previous sections, we only made this
study with the $n = 1000$ simulation sets. We summarize the results in
Table 6 and Figure 4. The classical method gives worse estimates of the
input galaxy merger fraction as the experimental errors increase. We
may take as observational reference the $\sigma_{\rm bin} = 0.25$ case
(for example, in Conselice et al. 2003 we have
$\sigma_{\rm bin} \sim 0.2$). In this case, the difference between the
input and the classical estimate is ~0.1 in the first and second
redshift intervals, which have the lower input galaxy merger fractions,
and ~0.05 in the third interval. Furthermore, the experimental errors
tend to smooth the galaxy merger fraction values. An extreme case is
$\sigma_{\rm bin} = 1$, where the dependence on $z$ has been lost. In
addition, the confidence intervals are underestimated and are ~0.035 in
every case. In contrast, the differences between the input and ML
method galaxy merger fractions are ~0.01 in every redshift bin and
experimental error case. Furthermore, the 68% confidence intervals are
more realistic: in the $\sigma_{\rm bin} = 0.25$ and 0.5 cases they are
~0.05, while in the $\sigma_{\rm bin} = 1.0$ case they increase to ~0.1.

Finally, we also determined the classical galaxy merger fraction in the
$n = 50$ and 100 cases, and noticed that the values of
$f^{\rm class}_{{\rm gm},k}$ were similar in each $\sigma_{\rm bin}$
case: having large samples does not improve the results, and we must
take into account the experimental errors in our analysis to avoid the
bias.

Fig. 4. Galaxy merger fraction estimates by the classical (gray
symbols) and ML (black symbols) methods. In both cases triangles are
for $\sigma_{\rm bin} = 0.25$, circles for $\sigma_{\rm bin} = 0.5$,
and squares for $\sigma_{\rm bin} = 1$. The black solid lines are the
input galaxy merger fraction in each redshift bin. We can take
$\sigma_{\rm bin} = 0.25$ as observational reference.



4. DETERMINATION OF ANY ONE-
OR BIDIMENSIONAL DISTRIBUTION
BY THE ML METHOD

The method outlined here may easily be applied to the unbiased
determination of any bidimensional distribution in the presence of
observational errors. For example, the automatic indices $M_{20}$ and
$G$ are used in Lotz et al. (2008) to determine the galaxy merger
fraction by morphological criteria. We could apply the ML method by
defining the variable $MG = G + 0.14 M_{20} - 0.33$ and by classifying
as merger systems all sources with $MG > 0$. Similarly, we may apply
the ML method to obtain the density of sources in color-color diagrams,
especially when we have some condition that separates populations, or
to determine the one-dimensional histogram of any observational
magnitude.
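The combined index can be written down directly. The sketch below is only illustrative: the function names and the example values of $G$ and $M_{20}$ are hypothetical, and the mapping to an asymmetry-like bin index (0 for normal systems, 1 for mergers) is an assumption about how the variable would be fed to the ML method.

```python
def merger_index(gini, m20):
    """Combined index MG = G + 0.14 * M20 - 0.33."""
    return gini + 0.14 * m20 - 0.33


def morphology_bin(gini, m20):
    """Return 1 for merger systems (MG > 0) and 0 otherwise."""
    return 1 if merger_index(gini, m20) > 0.0 else 0


if __name__ == "__main__":
    # Hypothetical (G, M20) pairs, for illustration only.
    for gini, m20 in [(0.60, -1.0), (0.50, -1.8)]:
        mg = merger_index(gini, m20)
        print(f"G = {gini}, M20 = {m20}: MG = {mg:+.3f}, "
              f"bin = {morphology_bin(gini, m20)}")
```

In this formulation the observational errors on $G$ and $M_{20}$ would have to be propagated to $MG$ before applying the method.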

For reference, we provide an outline for the ap- 
plication of the ML method to any one- or bidi- 
mensional experimental distribution subject to ob- 
servational errors: 

1. Define the observational catalog. This catalog cannot be restricted
to the interval of interest, e.g., $[z_0, z_k]$, because there are
sources with both $z_i < z_0$ and $z_i > z_k$ that could belong to a
real bidimensional distribution bin within the range of interest due to
the observational errors. In general one should include in the sample
those sources with $z_i + 2\sigma_i > z_0$ and $z_i - 2\sigma_i < z_k$
to avoid incompleteness effects.

2. Apply the ML method to the observational catalog. First, define the
bidimensional distribution bins taking into account the size of the
observational errors. Next, minimize Equation (11) to obtain the most
probable values of $p'_{kl}$, $p'_{kl,{\rm ML}}$. To determine their
confidence intervals, calculate the Hessian matrix, Equation (13), with
the observational data and the previous $p'_{kl,{\rm ML}}$ values. The
diagonal elements of the inverse Hessian matrix provide
$\sigma_{p'_{kl,{\rm ML}}}$. Notice that we assumed Gaussian
experimental errors, Equations (5) and (6), in the development of the
ML method. If you need to assume other experimental error
distributions, you need to recalculate Equations (11), (14), and (15)
with the new error distributions.

3. Check the results with representative synthetic catalogs. Run
simulations with synthetic catalogs to test the accuracy and
Gaussianity limits of the method in each particular case, following the
methodology of Sections 3.3, 3.4, and 3.5. These synthetic catalogs
should have the previous $p'_{kl,{\rm ML}}$ as bidimensional
distribution input, that is, as $p'_{kl,{\rm in}}$, and similar
characteristics to the experimental ones to fix the other input
parameters. For example, synthetic and experimental catalogs should
have the same number of sources $n$, and $\overline{\sigma_z}$ may be
given by the median of the photometric redshift errors in each redshift
bin, while for $\sigma_{\sigma_z}$ one may use the dispersions of these
photometric redshift errors. In addition, it is important to take into
account special cases, e.g., the number of sources with $z_{\rm spec}$,
which have $\sigma_z \sim 0$, in each bin, and to avoid unphysical
values, e.g., negative redshifts.

4. Determine $p_{kl}$, Equation (16), and their confidence intervals,
Equations (18) and (19), in the reliable cases.
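Two pieces of the outline above lend themselves to a short sketch: the sample selection of step 1 and the drawing of perturbed redshifts for the synthetic catalogs of step 3. The code is illustrative only. The function names are hypothetical, the catalog is assumed to be a list of (redshift, error) pairs, and redrawing until the observed redshift is non-negative is one possible way, assumed here, of avoiding unphysical values.

```python
import random


def select_sources(catalog, z_min, z_max, n_sigma=2.0):
    """Keep sources whose n_sigma error interval overlaps [z_min, z_max].

    `catalog` is a list of (z, sigma_z) pairs.
    """
    return [(z, sig) for z, sig in catalog
            if z + n_sigma * sig > z_min and z - n_sigma * sig < z_max]


def draw_observed_redshift(z_true, sigma_median, sigma_dispersion, rng):
    """Perturb a true redshift with a Gaussian error.

    The error size is drawn around `sigma_median` with dispersion
    `sigma_dispersion`; negative observed redshifts are redrawn
    (an assumption made for this sketch).
    """
    while True:
        sigma = abs(rng.gauss(sigma_median, sigma_dispersion))
        z_obs = rng.gauss(z_true, sigma)
        if z_obs >= 0.0:
            return z_obs, sigma


if __name__ == "__main__":
    # Hypothetical catalog; the last source mimics a spectroscopic redshift.
    catalog = [(0.05, 0.1), (-0.3, 0.1), (1.25, 0.05), (1.5, 0.1), (0.6, 0.0)]
    print(select_sources(catalog, 0.0, 1.2))
    rng = random.Random(3)
    print(draw_observed_redshift(0.05, 0.1, 0.05, rng))
```

Sources just outside the range of interest are kept when their error interval reaches into it, while sources far outside are dropped.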

5. CONCLUSIONS 

We have presented a maximum likelihood 
method to recover bidimensional distributions of 
experimental data subject to measurement errors, 
and applied it to the determination of the galaxy 
merger fraction based on asymmetry criteria from 
C03. 

The Gaussianity of $p'_{kl,{\rm ML}}$ is the strongest condition on the
reliability of the method. From the results, taking into account that
typical observational catalogs usually have a few hundred sources, and
that the probabilities $p'_{kl}$ would be small, we conclude that the
bin size of the bidimensional distribution must be at least twice the
typical redshift error in the observational catalog. Within this
quality limit, the ML method can recover accurately and reliably the
information lost due to the experimental errors. Moreover, our results
have realistic errors with known shapes, which the classical histograms
cannot provide.

The ML method presented here may in principle be extended to as many
dimensions as required by the astrophysical problem we are addressing.
For instance, if we wish to determine variations in the galaxy merger
fraction as a function of galaxy mass, errors in the galaxy mass
determination would make objects spill over from one mass bin to the
next, biasing the classical histogram approach. The ML method with an
added mass axis would solve the problem. Even if we are not seeking to
determine the variation of the galaxy merger fraction with mass, our
parent sample unavoidably has a boundary (e.g., luminosity; mass;
color), and observational errors make objects jump in and out of the
sample, hence potentially modifying the shape of the distribution we
are trying to determine. This extension to higher dimensions is
straightforward only when the third variable is independent from the
other two. In the case of a third luminosity or mass axis, this is
unfortunately not the case: luminosity and mass depend on galaxy
redshift, introducing covariances between the variables. Furthermore,
luminosity and mass are affected by incompleteness functions, making
our problem non-analytic. We leave the treatment of this problem for
future work.

We dedicate this paper to the memory of our six IAC colleagues and
friends who met with a fatal accident in Piedra de los Cochinos,
Tenerife, in February 2007, with special thanks to Maurizio Panniello,
whose teachings of Python were so important for this paper.

This work was supported by the Spanish Programa Nacional de Astronomía
y Astrofísica through project number AYA2006-12955.
Table 1

Input bidimensional distribution used for the synthetic catalogs

k   l   p_kl,in    p'_kl,in   [z_k, z_k+1)   [A_l, A_l+1)

0   0   0.71428    -0.33647   [0, 0.4)       [-0.35, 0.35)
1   0   1.07143     0.06899   [0.4, 0.8)     [-0.35, 0.35)
2   0   0.89286    -0.11333   [0.8, 1.2)     [-0.35, 0.35)
0   1   0.17857    -1.72277   [0, 0.4)       [0.35, 1.05)
1   1   0.28571    -1.25276   [0.4, 0.8)     [0.35, 1.05)
2   1   0.42857    -0.84730   [0.8, 1.2)     [0.35, 1.05)

Note. Variable definitions:
k: index that scans the redshift bins.
l: index that scans the asymmetry bins.
p_kl,in: probability that a source has redshift in bin k and
asymmetry in bin l.
p'_kl,in: logarithm of p_kl,in.
[z_k, z_k+1): redshift bin k.
[A_l, A_l+1): asymmetry bin l.
Table 2

Variable definitions for the simulations

Variable   Definition

Input Variables

$n$   Number of total sources in a synthetic catalog.
$n_{kl}$   Number of sources in the $[z_k, z_{k+1}) \cap [A_l, A_{l+1})$ bin.
$\Delta z$   Redshift bin size.
$\Delta A$   Asymmetry bin size.
$N$   Number of synthetic catalogs in each simulation set.
$p'_{kl,{\rm in}}$   Logarithmic probabilities of the input bidimensional distribution of the synthetic catalogs.
$\overline{\sigma_z}$   Median experimental errors in redshift of the synthetic catalog sources.
$\sigma_{\sigma_z}$   Dispersion on $\sigma_z$ of the synthetic catalog sources.
$\overline{\sigma_A}$   Median experimental errors in asymmetry of the synthetic catalog sources.
$\sigma_{\sigma_A}$   Dispersion on $\sigma_A$ of the synthetic catalog sources.
$\sigma_{\rm bin} = \overline{\sigma_z}/\Delta z = \overline{\sigma_A}/\Delta A$   Dimensionless experimental error size.

Output Variables

$p'_{kl,{\rm class}}$   Logarithmic probabilities of the classical bidimensional distribution.
$\overline{p'_{kl,{\rm class}}}$   Median of the $N$ values of $p'_{kl,{\rm class}}$ in one simulation set.
$s_{p'_{kl,{\rm class}}}$   Standard deviation of the $N$ values of $p'_{kl,{\rm class}}$ in one simulation set.
$p'_{kl,{\rm ML}}$   Logarithmic probabilities of the bidimensional distribution recovered by the ML method.
$\sigma_{p'_{kl,{\rm ML}}}$   The 68% confidence interval of $p'_{kl,{\rm ML}}$ given by the ML method, $[p'_{kl,{\rm ML}} - \sigma_{p'_{kl,{\rm ML}}},\, p'_{kl,{\rm ML}} + \sigma_{p'_{kl,{\rm ML}}}]$.
$\overline{p'_{kl,{\rm ML}}}$   Median of the $N$ values of $p'_{kl,{\rm ML}}$ in one simulation set.
$s_{p'_{kl,{\rm ML}}}$   Standard deviation of the $N$ values of $p'_{kl,{\rm ML}}$ in one simulation set.
$\overline{\sigma_{p'_{kl,{\rm ML}}}}$   Median of the $N$ values of $\sigma_{p'_{kl,{\rm ML}}}$ in one simulation set.

Quality Variables

$T_{kl,{\rm ML}} = \sqrt{N}\, |\overline{p'_{kl,{\rm ML}}} - p'_{kl,{\rm in}}| / s_{p'_{kl,{\rm ML}}}$   Accepted that $p'_{kl,{\rm in}} = \overline{p'_{kl,{\rm ML}}}$ when $T_{kl,{\rm ML}} < 2.6$.
$F_{kl} = \max(\overline{\sigma_{p'_{kl,{\rm ML}}}}, s_{p'_{kl,{\rm ML}}})^2 / \min(\overline{\sigma_{p'_{kl,{\rm ML}}}}, s_{p'_{kl,{\rm ML}}})^2$   Accepted that $\overline{\sigma_{p'_{kl,{\rm ML}}}} = s_{p'_{kl,{\rm ML}}}$ when $F_{kl} < 1.18$.
$s_{p'_{kl},{\rm iter}}$   Standard deviation of the ML method due to the iterative minimization process.
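The two quality variables can be evaluated with a few lines of code. The sketch below assumes that $T$ is the offset of the median recovered value from the input value in units of the standard error $s/\sqrt{N}$, and that $F$ is the ratio of the larger to the smaller variance; these forms are inferred here because they reproduce the tabulated values, and the function names are hypothetical. The example uses the $p'_{00}$ row of the $n = 50$, $\sigma_{\rm bin} = 0.25$ simulation set in Table 3.

```python
import math


def t_statistic(p_median, p_input, s, n_catalogs):
    """Offset of the median from the input, in units of s / sqrt(N)."""
    return abs(p_median - p_input) / (s / math.sqrt(n_catalogs))


def f_statistic(sigma_median, s):
    """Ratio of the larger to the smaller of the two variances."""
    hi, lo = max(sigma_median, s), min(sigma_median, s)
    return (hi / lo) ** 2


if __name__ == "__main__":
    t = t_statistic(p_median=-0.33337, p_input=-0.33647,
                    s=0.33981, n_catalogs=1000)
    f = f_statistic(sigma_median=0.34855, s=0.33981)
    print(f"T = {t:.2f} (accept if < 2.6), F = {f:.3f} (accept if < 1.18)")
```

For this row the sketch returns values that round to the tabulated 0.29 and 1.052, so both acceptance criteria are met.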
Table 3

Results of ML method over N = 1000 synthetic catalogs with n = 50 sources

                  ------------------- ML method --------------------  ----- Classical method -----
p'_kl      p'_in  median p'       T      s_p'  median sigma        F  median p'      s_p'        T

$\overline{\sigma_z} = \sigma_{\sigma_z} = \overline{\sigma_A} = \sigma_{\sigma_A} = 0$, $\sigma_{\rm bin} = 0$

p'_00   -0.33647   -0.31627     ...   0.30034       0.28284      ...   -0.31627   0.30034      ...
p'_10    0.06899    0.08920     ...   0.19871       0.22361      ...    0.08920   0.19871      ...
p'_20   -0.11333   -0.13395     ...   0.30034       0.25166      ...   -0.13395   0.30034      ...
p'_01   -1.72277   -1.92571     ...   0.81379       0.69282      ...   -1.92571   0.81379      ...
p'_11   -1.25276   -1.23256     ...   0.37839       0.47958      ...   -1.23256   0.37839      ...
p'_21   -0.84730   -0.82710     ...   0.51344       0.38297      ...   -0.82710   0.51344      ...

$\overline{\sigma_z} = 0.1$, $\sigma_{\sigma_z} = 0.05$, $\overline{\sigma_A} = 0.175$, $\sigma_{\sigma_A} = 0.0875$, $\sigma_{\rm bin} = 0.25$

p'_00   -0.33647   -0.33337    0.29   0.33981       0.34855    1.052   -0.39768   0.34393     5.63
p'_10    0.06899    0.08583    1.82   0.29197       0.29191    1.001    0.00779   0.24453     7.91
p'_20   -0.11333   -0.11093    0.24   0.31124       0.30611    1.034   -0.20757   0.30034     9.92
p'_01   -1.72277   -1.78200    1.82   1.02762       0.87331    1.385   -1.40150   0.53047    19.15
p'_11   -1.25276   -1.32243    2.39   0.92324       0.74240    1.546   -0.93512   0.45987    21.84
p'_21   -0.84730   -0.83122    1.01   0.50531       0.48405    1.090   -0.82005   0.39750     2.17

$\overline{\sigma_z} = 0.2$, $\sigma_{\sigma_z} = 0.1$, $\overline{\sigma_A} = 0.35$, $\sigma_{\sigma_A} = 0.175$, $\sigma_{\rm bin} = 0.5$

p'_00   -0.33647   -0.34488    0.59   0.45113       0.47976    1.131   -0.49339   0.36822    13.48
p'_10    0.06899    0.08788    1.37   0.43485       0.42913    1.027   -0.07188   0.29763    14.97
p'_20   -0.11333   -0.07862    2.66   0.41200       0.39237    1.102   -0.29420   0.36126    15.83
p'_01   -1.72277   -1.88189    1.05   4.79089       1.45853   10.789   -1.17049   0.55578    31.42
p'_11   -1.25276   -1.27478    0.31   2.25784       1.21780    3.437   -0.72192   0.44502    37.72
p'_21   -0.84730   -0.87929    1.34   0.75458       0.70106    1.158   -0.71523   0.46986     8.89

$\overline{\sigma_z} = 0.4$, $\sigma_{\sigma_z} = 0.2$, $\overline{\sigma_A} = 0.7$, $\sigma_{\sigma_A} = 0.35$, $\sigma_{\rm bin} = 1.0$

p'_00   -0.33647   -0.31312    1.21   0.60964       0.88500    2.107   -0.53435   0.49716    12.59
p'_10    0.06899    0.15383    3.79   0.70757       0.85943    1.475   -0.19260   0.39926    20.72
p'_20   -0.11333   -0.01272    4.91   0.64776       0.69944    1.166   -0.34675   0.44899    16.44
p'_01   -1.72277   -2.20397    1.61   9.46236       5.43006    3.037   -0.93981   0.61357    40.35
p'_11   -1.25276   -1.78130    2.55   6.54863       5.83082    1.261   -0.58564   0.55583    37.95
p'_21   -0.84730   -0.89694    0.70   2.25333       1.44775    2.422   -0.68095   0.54790     9.60
Table 4

Results of ML method over N = 1000 synthetic catalogs with n = 100 sources

                  ------------------- ML method --------------------  ----- Classical method -----
p'_kl      p'_in  median p'       T      s_p'  median sigma        F  median p'      s_p'        T

$\overline{\sigma_z} = \sigma_{\sigma_z} = \overline{\sigma_A} = \sigma_{\sigma_A} = 0$, $\sigma_{\rm bin} = 0$

p'_00   -0.33647   -0.33647     ...   0.22391       0.20000      ...   -0.33647   0.22391      ...
p'_10    0.06899    0.06899     ...   0.14864       0.15275      ...    0.06899   0.14864      ...
p'_20   -0.11333   -0.11333     ...   0.17864       0.17321      ...   -0.11333   0.17864      ...
p'_01   -1.72277   -1.72277     ...   0.62763       0.43589      ...   -1.72277   0.62763      ...
p'_11   -1.25276   -1.25276     ...   0.37839       0.33912      ...   -1.25276   0.37839      ...
p'_21   -0.84730   -0.84730     ...   0.24924       0.27080      ...   -0.84730   0.24924      ...

$\overline{\sigma_z} = 0.1$, $\sigma_{\sigma_z} = 0.05$, $\overline{\sigma_A} = 0.175$, $\sigma_{\sigma_A} = 0.0875$, $\sigma_{\rm bin} = 0.25$

p'_00   -0.33647   -0.35726    2.77   0.23715       0.24543    1.071   -0.42902   0.22794    12.84
p'_10    0.06899    0.08249    2.02   0.21156       0.20096    1.108    0.01114   0.18577     9.85
p'_20   -0.11333   -0.11930    0.81   0.23380       0.21587    1.173   -0.20244   0.20019    14.08
p'_01   -1.72277   -1.72171    0.05   0.61932       0.57412    1.164   -1.40984   0.41453    23.87
p'_11   -1.25276   -1.24035    0.76   0.51661       0.49041    1.110   -0.92149   0.28759    36.42
p'_21   -0.84730   -0.84335    0.37   0.33559       0.34188    1.038   -0.79798   0.28500     5.47

$\overline{\sigma_z} = 0.2$, $\sigma_{\sigma_z} = 0.1$, $\overline{\sigma_A} = 0.35$, $\sigma_{\sigma_A} = 0.175$, $\sigma_{\rm bin} = 0.5$

p'_00   -0.33647   -0.34332    0.74   0.28963       0.31999    1.221   -0.47807   0.26072    17.17
p'_10    0.06899    0.07331    0.47   0.28921       0.29111    1.013   -0.07671   0.20306    22.69
p'_20   -0.11333   -0.11022    0.35   0.27894       0.27998    1.007   -0.26198   0.23834    19.72
p'_01   -1.72277   -1.69378    1.03   0.89067       0.81972    1.181   -1.17948   0.42601    40.33
p'_11   -1.25276   -1.26124    0.30   0.88586       0.77025    1.323   -0.72455   0.31834    52.47
p'_21   -0.84730   -0.85992    0.88   0.45226       0.46377    1.051   -0.76835   0.32399     7.71

$\overline{\sigma_z} = 0.4$, $\sigma_{\sigma_z} = 0.2$, $\overline{\sigma_A} = 0.7$, $\sigma_{\sigma_A} = 0.35$, $\sigma_{\rm bin} = 1.0$

p'_00   -0.33647   -0.34227    0.40   0.45903       0.57113    1.548   -0.52269   0.33388    17.64
p'_10    0.06899    0.10014    2.25   0.43673       0.56965    1.701   -0.18589   0.26832    30.04
p'_20   -0.11333   -0.08903    1.63   0.46981       0.47632    1.028   -0.39434   0.32866    27.04
p'_01   -1.72277   -1.82510    0.65   4.99004       2.17650    5.256   -0.91894   0.40459    62.83
p'_11   -1.25276   -1.38142    1.32   3.07524       2.24256    1.880   -0.58439   0.34776    60.78
p'_21   -0.84730   -0.84699    0.01   0.85518       0.87014    1.035   -0.65849   0.35630    16.76
Table 5

Results of ML method over N = 1000 synthetic catalogs with n = 1000 sources

                  ------------------- ML method --------------------  ----- Classical method -----
p'_kl      p'_in  median p'       T      s_p'  median sigma        F  median p'      s_p'        T

$\overline{\sigma_z} = \sigma_{\sigma_z} = \overline{\sigma_A} = \sigma_{\sigma_A} = 0$, $\sigma_{\rm bin} = 0$

p'_00   -0.33647   -0.33647     ...   0.06137       0.06325      ...   -0.33647   0.06137      ...
p'_10    0.06899    0.06899     ...   0.04940       0.04830      ...    0.06899   0.04940      ...
p'_20   -0.11333   -0.11333     ...   0.05336       0.05477      ...   -0.11333   0.05336      ...
p'_01   -1.72277   -1.72277     ...   0.14864       0.13784      ...   -1.72277   0.14864      ...
p'_11   -1.25276   -1.25276     ...   0.11132       0.10724      ...   -1.25276   0.11132      ...
p'_21   -0.84730   -0.84730     ...   0.08066       0.08563      ...   -0.84730   0.08066      ...

$\overline{\sigma_z} = 0.1$, $\sigma_{\sigma_z} = 0.05$, $\overline{\sigma_A} = 0.175$, $\sigma_{\sigma_A} = 0.0875$, $\sigma_{\rm bin} = 0.25$

p'_00   -0.33647   -0.35327    6.99   0.07596       0.07700    1.028   -0.41246   0.07170    33.51
p'_10    0.06899    0.07525    3.19   0.06204       0.06414    1.139    0.00102   0.05671    37.90
p'_20   -0.11333   -0.11170    0.73   0.07046       0.06730    1.094   -0.19532   0.06814    38.05
p'_01   -1.72277   -1.72333    0.10   0.18378       0.18398    1.005   -1.41729   0.12568    76.86
p'_11   -1.25276   -1.24578    1.35   0.16351       0.15516    1.138   -0.92943   0.09163   111.59
p'_21   -0.84730   -0.84306    1.23   0.10901       0.10726    1.060   -0.80147   0.08635    16.78

$\overline{\sigma_z} = 0.2$, $\sigma_{\sigma_z} = 0.1$, $\overline{\sigma_A} = 0.35$, $\sigma_{\sigma_A} = 0.175$, $\sigma_{\rm bin} = 0.5$

p'_00   -0.33647   -0.35273    5.28   0.09732       0.09866    1.027   -0.47684   0.08066    55.03
p'_10    0.06899    0.07463    1.83   0.09727       0.09114    1.069   -0.07992   0.06535    72.06
p'_20   -0.11333   -0.11731    1.36   0.09237       0.08829    1.096   -0.28307   0.07064    75.99
p'_01   -1.72277   -1.71352    1.12   0.26019       0.25957    1.002   -1.17318   0.13081   132.86
p'_11   -1.25276   -1.23600    2.09   0.25323       0.23734    1.110   -0.70824   0.09678   177.92
p'_21   -0.84730   -0.84142    1.25   0.14808       0.14385    1.033   -0.75047   0.10000    30.62

$\overline{\sigma_z} = 0.4$, $\sigma_{\sigma_z} = 0.2$, $\overline{\sigma_A} = 0.7$, $\sigma_{\sigma_A} = 0.35$, $\sigma_{\rm bin} = 1.0$

p'_00   -0.33647   -0.37667    8.27   0.15372       0.16437    1.143   -0.52454   0.10316    57.65
p'_10    0.06899    0.07901    1.95   0.16218       0.16229    1.001   -0.20842   0.08748   100.28
p'_20   -0.11333   -0.10270    2.45   0.13490       0.14196    1.107   -0.36879   0.09448    85.51
p'_01   -1.72277   -1.65260    5.03   0.44100       0.46316    1.103   -0.91542   0.13080   195.18
p'_11   -1.25276   -1.26323    0.67   0.49175       0.49023    1.006   -0.56759   0.11510   188.24
p'_21   -0.84730   -0.85242    0.65   0.24892       0.25306    1.033   -0.67546   0.11284    48.15
Table 6

Real, ML method, and classic galaxy merger fraction

                         ------------------ f_gm,k (ML) -----------------   ------------- f_gm,k (classical) ------------
z bin        f_gm,in     sigma_bin=0.25   sigma_bin=0.5       sigma_bin=1.0       sigma_bin=0.25   sigma_bin=0.5    sigma_bin=1.0

[0, 0.4)     0.3333      0.337 +.../-...  0.339 +.../-0.042   0.358 +.../-0.098   0.423 ± 0.035    0.499 ± 0.038    0.575 ± 0.040
[0.4, 0.8)   0.3478      0.348 +.../-...  0.350 +.../-...     0.343 +.../-...     0.441 ± 0.027    0.516 ± 0.029    0.583 ± 0.035
[0.8, 1.2)   0.4897      0.490 +.../-...  0.492 +.../-...     0.486 +.../-...     0.522 ± 0.027    0.556 ± 0.030    0.595 ± 0.035